
Optimal and Approximate Q-value Functions for Decentralized POMDPs


Abstract

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
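
The single-agent recipe mentioned in the abstract (compute Q* recursively by dynamic programming, then extract a policy from it) can be illustrated with a minimal sketch for a finite-horizon MDP. This is not the paper's Dec-POMDP method; the function names `finite_horizon_q` and `greedy_policy`, the two-state toy model, and the data layout (`T[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as an immediate reward) are all assumptions made here for illustration.

```python
def finite_horizon_q(T, R, horizon):
    """Backward induction for a small finite-horizon MDP (illustrative only).

    T[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    Returns Q with Q[t][s][a] = expected return of taking a in s at stage t
    and acting optimally afterwards.
    """
    n_states = len(T)
    Q = [None] * horizon
    v_next = [0.0] * n_states  # value after the final stage is zero
    for t in reversed(range(horizon)):
        Q[t] = [
            [R[s][a] + sum(p * v_next[s2] for s2, p in T[s][a])
             for a in range(len(T[s]))]
            for s in range(n_states)
        ]
        # Optimal stage-t value: best Q-value in each state.
        v_next = [max(q_s) for q_s in Q[t]]
    return Q


def greedy_policy(Q):
    """Extract a policy by picking an action with maximal Q-value per stage and state."""
    return [[max(range(len(q_s)), key=q_s.__getitem__) for q_s in Q_t]
            for Q_t in Q]


# Hypothetical toy model: 2 states, 2 actions, deterministic transitions.
# Action 0 stays in the current state, action 1 switches to the other state.
T = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, -1.0],   # state 0: staying pays 0, switching costs 1
     [2.0, 0.0]]    # state 1: staying pays 2, switching pays 0

Q = finite_horizon_q(T, R, horizon=3)
policy = greedy_policy(Q)
print(Q[0])    # stage-0 Q-values
print(policy)  # policy[t][s] = chosen action index
```

In this toy model the extracted policy pays the one-off switching cost from state 0 while enough stages remain to recoup it. The decentralized setting studied in the paper is harder because each agent acts on its own observations, which is why the abstract distinguishes two forms of the optimal Q-value function and turns to approximations.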
